Derivation of the small-angle scattering profile of a target biomacromolecule from a profile deteriorated by aggregates. AUC–SAS

An integrated method using analytical ultracentrifugation (AUC) and small-angle scattering (SAS), AUC–SAS, has been developed for the structural analysis of a biomacromolecule in solution. In this study, the first version of AUC–SAS is improved upon so as to be applicable to a solution with a large number of aggregates.


Introduction
Small-angle X-ray and neutron scattering (SAXS and SANS), collectively abbreviated as SAS, are increasingly being used to reveal structures of biomacromolecules in solution (Svergun & Koch, 2003; Bernadó et al., 2018; Mahieu & Gabel, 2018). Modern computational analysis methods for SAS offer a detailed three-dimensional structural model (Grant, 2018; Bengtsen et al., 2020; Gräwert & Svergun, 2020; Matsumoto et al., 2020; Okuda et al., 2021; Shimizu et al., 2022; Yunoki et al., 2022). To build a reliable structural model using these methods, it is crucial to obtain an experimental scattering profile that purely corresponds to the target molecule. However, even with a small content of aggregates (<10%), the scattering profile deteriorates from that of the target molecule and can result in an incorrect structural model. Moreover, there is another serious problem related to aggregates. Typically, an abnormal upturn of the scattering profile in the lowest scattering-angle region is recognized as experimental evidence of aggregate contamination. However, the scattering profile cannot show such clear evidence when the weight fraction of the aggregates is low. For example, the Guinier approximation holds for a sample with a small weight fraction of aggregates, and the scattering profile is expressed as a straight line in the Guinier plot, which gives the gyration radius of the sample biomacromolecule. However, when the gyration radius is larger than the expected radius, it is difficult to determine whether the solution includes aggregates or whether the target molecule itself is deformed from the expected structure. Accordingly, to solve the 'aggregation problems' of the identification and removal of aggregates, SAS coupled with other methods, such as size-exclusion chromatography (SEC-SAXS), has been explored (David & Pérez, 2009; Ryan et al., 2018; Inoue et al., 2019).
Recently, another integrated approach using analytical ultracentrifugation (AUC) and SAS, abbreviated AUC-SAS (Morishima et al., 2020), has been developed to overcome aggregation problems. AUC-SAS derives the scattering profile of the target molecule in a solution that includes aggregates by utilizing the molecular distribution obtained with AUC. AUC-SAS reportedly offers precise scattering profiles of several biomacromolecules in solution (Hirano et al., 2021). Because AUC-SAS does not require the large amount of sample or the very high intensity instrument needed for synchrotron-light SAXS, it has the potential to be applied to laboratory-based SAXS. AUC-SAS is also applicable to SANS, which faces the same aggregation problem. Improving AUC-SAS will widen its range of applications. For example, the first version of AUC-SAS ('first AUC-SAS') was constrained by the weight fraction of the aggregates (less than ~10%). In the present study, we have improved AUC-SAS, making it applicable to samples with relatively large weight fractions of aggregates (>10%). Furthermore, we provide software for the improved AUC-SAS, which is available to any SAS experimenter.

AUC measurements
Sedimentation velocity AUC measurements were performed using a ProteomeLab XL-I (Beckman Coulter, USA). The samples were loaded into cells equipped with 1.5 mm path length titanium center pieces (Nanolytics, Germany). All measurements were performed using Rayleigh interference optics at 298 K. The rotor speed was set at 45 000 r min⁻¹ for BSA, AF, Cat, B2-cry and OVA, and at 60 000 r min⁻¹ for Lyz and RNaseA. The time evolution of the sedimentation data was analyzed using the multi-component Lamm equation (Lebowitz et al., 2002). The weight-concentration distribution $c(s_{20,w})$, as a function of the sedimentation coefficient, and the frictional ratio $f/f_0$ were computed using the SEDFIT software (version 15.01c) (Schuck, 2000). The sedimentation coefficient was normalized to the value at 293 K in pure water, $s_{20,w}$. The weight fraction of the $j$-mer, $r_j$, was obtained from the corresponding peak area of $c(s_{20,w})$. The molecular weight, $M_j$, of the $j$-mer was calculated using the corresponding peak position $s_{20,w,j}$ and $f/f_0$ (Brown & Schuck, 2006) as

$$M_j = 9\sqrt{2}\,\pi N_A \left[\eta\, s_{20,w,j}\,(f/f_0)\right]^{3/2} \frac{\bar{v}^{1/2}}{(1 - \bar{v}\rho)^{3/2}}, \tag{1}$$

where $\eta$, $\rho$, $N_A$ and $\bar{v}$ are the viscosity of water at 293 K, the density of water at 293 K, Avogadro's number and the partial specific volume of the protein, respectively.
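The conversion from a peak position and frictional ratio to a molecular weight can be illustrated with a short sketch. It assumes the standard combination of the Svedberg equation with Stokes friction for the volume-equivalent sphere; the function name, the SI unit convention and the default values (a typical protein partial specific volume of 0.73 cm³ g⁻¹, and the viscosity and density of water at 293 K) are illustrative assumptions, not values taken from this study.

```python
import math

N_A = 6.02214076e23  # Avogadro's number (1/mol)


def molar_mass_from_s(s_20w, f_ratio, vbar=0.73e-3, eta=1.002e-3, rho=998.2):
    """Molar mass (kg/mol) from a sedimentation coefficient and frictional ratio.

    s_20w   : sedimentation coefficient in seconds (1 S = 1e-13 s)
    f_ratio : frictional ratio f/f0 (dimensionless)
    vbar    : partial specific volume in m^3/kg (assumed typical protein value)
    eta     : viscosity of water at 293 K in Pa s (assumed literature value)
    rho     : density of water at 293 K in kg/m^3 (assumed literature value)
    """
    buoyancy = 1.0 - vbar * rho  # must be positive for a sedimenting particle
    return (9.0 * math.sqrt(2.0) * math.pi * N_A
            * (eta * s_20w * f_ratio) ** 1.5
            * math.sqrt(vbar) / buoyancy ** 1.5)


# Hypothetical input: a 4.3 S species with f/f0 = 1.3
print(molar_mass_from_s(4.3e-13, 1.3) * 1e3, "g/mol")
```

Multiplying the result by 1000 gives g mol⁻¹; a peak near 4.3 S with a frictional ratio of about 1.3 lands in the range expected for a protein of several tens of kDa.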

SAXS measurements
SAXS measurements were performed using a laboratory-based instrument (NANOPIX, Rigaku, Japan) equipped with a high-brilliance point-focused generator of a Cu Kα source (MicroMAX-007 HFMR, Rigaku, Japan) (wavelength = 1.54 Å). Scattered X-rays were measured using a HyPix-6000 hybrid photon counting detector (Rigaku, Japan) composed of 765 × 813 pixels with a spatial resolution of 100 µm. For all samples, the sample-to-detector distance (SDD) was set to 1330 mm, with which the covered $q$ range was $0.01 \leq q \leq 0.20$ Å⁻¹ (where $q$ is the magnitude of the scattering vector). Two-dimensional scattering patterns were converted to one-dimensional scattering profiles using the SAngler software (Shimizu et al., 2016). After correction by the transmittance and subtraction of buffer scattering, the absolute scattering intensity was obtained using the standard scattering intensity of water (1.632 × 10⁻² cm⁻¹) (Orthaber et al., 2000). All measurements were performed at 298 K.
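As a rough plausibility check of the quoted $q$ range, the sketch below converts a radial distance on the detector into $q$ using the usual definition $q = (4\pi/\lambda)\sin\theta$, where $2\theta$ is the scattering angle. The function name and the flat-detector, normal-incidence geometry are assumptions made for illustration; the actual reduction in this study was done with SAngler.

```python
import math


def q_from_radius(radius_mm, sdd_mm=1330.0, wavelength_A=1.54):
    """Magnitude of the scattering vector (1/Angstrom) for a detector point.

    radius_mm    : distance of the pixel from the direct-beam position (mm)
    sdd_mm       : sample-to-detector distance (mm)
    wavelength_A : X-ray wavelength (Angstrom)
    Assumes a flat detector perpendicular to the direct beam.
    """
    two_theta = math.atan2(radius_mm, sdd_mm)  # scattering angle 2*theta
    return 4.0 * math.pi / wavelength_A * math.sin(two_theta / 2.0)


# A point about 65 mm from the beam centre corresponds to q close to 0.20 1/Angstrom
print(q_from_radius(65.2))
```

With 100 µm pixels, this radius corresponds to roughly 650 pixels from the beam centre, which is compatible with the stated detector size if the beam is placed near one edge.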

SEC-SAXS measurements
SEC-SAXS measurements were conducted with a laboratory-based SEC-SAXS system (La-SSS) (Inoue et al., 2019), which is made up of a NANOPIX combined with a Prominence high-performance liquid chromatography system (SHIMADZU, Japan). A Superdex 200 Increase 10/300 GL for BSA, Cat, B2-cry and OVA, a Superose 6 Increase 10/300 GL for AF, and a Superdex 75 Increase 10/300 GL for Lyz and RNaseA were utilized as the SEC column. All measurements were performed at a flow rate of 0.02 ml min⁻¹ at 298 K.

SANS measurements
SANS measurements were performed using the SANS-U instrument located at JRR-3 (Japan Atomic Energy Agency, JAEA). A neutron beam at a wavelength of 6.0 Å with 10% resolution was irradiated on the samples. Scattered neutrons were counted using a two-dimensional detector (Ordela, USA). The SDDs were set to 4000 and 1030 mm, which covered a $q$ range of 0.010-0.35 Å⁻¹. Two-dimensional scattering patterns were converted to one-dimensional scattering profiles using the Red2D software (https://github.com/hurxl/Red2D). After correction by the transmittance and subtraction of buffer scattering, the absolute scattering intensity was obtained with the standard scattering intensity of H₂O (0.89 cm⁻¹) (Shibayama et al., 2005). All measurements were performed at 298 K.

Methodology
In this section, we explain how to derive the scattering profile of a monomer from that of a solution that includes aggregates by the AUC-SAS method (see §1 of the supporting information for further details), and present the problems in applying the first AUC-SAS to a solution with a high weight fraction of aggregates.

Derivation of the scattering profile of protein monomer from an ensemble-averaged scattering profile
The scattering profile of a solution containing the monomer and its aggregates, $I(q)$, is represented as

$$I(q) = \sum_{j=1}^{n} I_j(q) = \sum_{j=1}^{n} c_j\, i_j(q) = c \sum_{j=1}^{n} r_j\, i_j(q), \tag{2}$$

where $j$ denotes the association number ($1 \leq j \leq n$); $I_j(q)$, $c_j$ and $i_j(q)$ are the scattering profile, weight concentration and concentration-normalized scattering profile [$i_j(q) = I_j(q)/c_j$] for the $j$-mer, respectively; and $c$ and $r_j$ are the total concentration ($c = \sum_j c_j$) and the weight fraction for the $j$-mer ($r_j = c_j/c$), respectively. Since a $j$-mer could have diverse configurations, $I_j(q)$ indicates the ensemble-averaged scattering profile of all $j$-mers. Here, $c$ is assumed to be low enough that the scattering profile is free from the interparticle interference effect.
To solve equation (2) for $I_1(q)$, the weight fractions of all components, $\{r_j\}$ ($j \geq 1$) (#1), and the scattering profiles of the aggregates, $i_j(q)$ ($j \geq 2$) (#2), are required. As a prerequisite, highly denatured proteins and high-order aggregates are removed from the sample solution through the purification for a general SAS measurement. Hence, it is reasonable to assume that the residual aggregates are 4-mers at most ($j \leq 4$) and that the total weight fraction of the aggregates, $r_a$ ($= 1 - r_1$), is <0.2. If this prerequisite is not satisfied (i.e. $j > 4$ and/or $r_a > 0.2$), the sample should be re-purified. Under these conditions, AUC offers information #1 ($\{r_j\}$) (§2 of the supporting information). Next, to obtain information #2 [$i_j(q)$ ($j \geq 2$)], we divided $i_j(q)$ into two $q$ regions, $i_{jH}(q)$ and $i_{jL}(q)$, in the sufficiently high and lower $q$ regions, respectively. Here, $i_{jH}(q) \simeq i_{1H}(q)$, because there is no difference in the inner local structure between the monomer and the aggregates under the prerequisite conditions (no highly denatured aggregates in the sample). Therefore, $I_{1H}(q)$ is obtained using $I(q)$ and $r_1$ as follows (see §1 of the supporting information for further details):

$$I_{1H}(q) = r_1 I(q), \tag{3}$$

where $I(q)$ and $r_1$ are experimentally offered by SAS and AUC, respectively. On the other hand, the extrapolation of equation (3) to the lower $q$ region, i.e. $i_{jL}(q) \simeq i_{1L}(q)$, does not hold (open magenta circles in Fig. S1 of the supporting information). Therefore, $I_{1L}(q)$ is considered as follows. First, the forward scattering intensity, $I_1(0)$, is obtained with $I(0)$, $r_j$ and $M_j$, which are experimentally given by SAS and AUC, as follows (see §1 of the supporting information for further details):

$$I_1(0) = \frac{r_1 M_1}{\sum_{j=1}^{n} r_j M_j}\, I(0). \tag{4}$$

The remaining issue is how to obtain $I_{1L}(q)$ ($q > 0$), namely how to connect $I_{1H}(q)$ and $I_1(0)$. The first AUC-SAS (Morishima et al., 2020) connects them with the Guinier formula:

$$I_{1L}(q) = I_1(0) \exp\left(-\frac{R_{g1}^2 q^2}{3}\right), \tag{5}$$

where $R_{g1}$ is the gyration radius of the target monomer. As $R_{g1}$ is an adjustable parameter, a reasonable $I_{1L}(q)$ is found by a smooth joint with $I_{1H}(q)$ at the joint point $q_c$. Finally, $I_1(q)$ is derived from $I_{1L}(q)$ ($q \leq q_c$) and $I_{1H}(q)$ ($q > q_c$), and the appropriate $R_{g1}$ is also provided (see §1 of the supporting information for further details).

The black lines in Fig. 1 represent the concentration-normalized scattering profiles, $i_1(q)_{\rm Xtal}$, calculated from the crystal structure of the BSA monomer (PDB code 4f5s; Bujacz, 2012). Here, $i_1(q)_{\rm Xtal}$ is identical to that obtained using SEC-SAXS for a BSA solution (Bucciarelli et al., 2018). Fig. 1(d) shows the deviations between the scattering profile derived from the first AUC-SAS, $i_1(q)$, and that calculated from the crystal structure, $i_1(q)_{\rm Xtal}$, i.e. $\Delta i_1(q)/\sigma(q)$. Here, $\Delta i_1(q) = i_1(q) - i_1(q)_{\rm Xtal}$ and $\sigma(q)$ is the error of $i_1(q)$. The first AUC-SAS successfully offered a reasonable $i_1(q)$ at $r_a = 0.06$ [$\Delta i_1(q)/\sigma(q) < 1$] but produced a large deviation in the middle $q$ region ($0.5 \leq qR_{g1} \leq 3$) at $r_a = 0.13$ and 0.20 [$\Delta i_1(q)/\sigma(q) > 1$]. As a result, $R_{g1}$ at $r_a = 0.06$ ($R_{g1} = 27.2 \pm 0.2$ Å) is consistent with that of the crystal structure ($R_{g1,{\rm Xtal}} = 27.1$ Å), whereas the $R_{g1}$ values at $r_a = 0.13$ and 0.20 ($R_{g1} = 27.5 \pm 0.2$ Å and $R_{g1} = 28.1 \pm 0.2$ Å, respectively) are larger than $R_{g1,{\rm Xtal}}$ [$R_{g1}$ and $i_1(0)$ are listed in Table S1 of the supporting information].
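The procedure of the first AUC-SAS can be sketched in a few lines. The sketch takes the high-$q$ part as $r_1 I(q)$, obtains the monomer forward intensity by weighting $I(0)$ with $r_1 M_1/\sum_j r_j M_j$, and joins the two with a Guinier curve. As a simplifying assumption, the joint is made by matching only the value at a user-supplied $q_c$, whereas the published procedure searches for a smooth joint; the function names and the use of linear interpolation are illustrative choices, not the authors' implementation.

```python
import numpy as np


def monomer_forward_intensity(I0, r, M):
    """Forward intensity of the monomer from I(0), weight fractions r_j and masses M_j."""
    r = np.asarray(r, dtype=float)
    M = np.asarray(M, dtype=float)
    return r[0] * M[0] * I0 / np.sum(r * M)


def first_auc_sas(q, I, r, M, I0, q_c):
    """Monomer profile I_1(q) and R_g1 from value matching at q_c (simplified joint).

    q, I : measured profile of the solution (1D arrays, q increasing)
    r, M : weight fractions and molecular weights, monomer first (from AUC)
    I0   : forward scattering intensity I(0) of the solution
    q_c  : joint point, assumed to lie inside the measured q range
    """
    q = np.asarray(q, dtype=float)
    I = np.asarray(I, dtype=float)
    r = np.asarray(r, dtype=float)
    I1H = r[0] * I                                   # high-q approximation
    I1_0 = monomer_forward_intensity(I0, r, M)       # monomer forward intensity
    I1H_c = np.interp(q_c, q, I1H)                   # value of I_1H at the joint
    # Guinier curve through (0, I1_0) and (q_c, I1H_c); needs I1H_c < I1_0
    Rg1 = np.sqrt(-3.0 * np.log(I1H_c / I1_0)) / q_c
    I1L = I1_0 * np.exp(-(Rg1 * q) ** 2 / 3.0)
    return np.where(q <= q_c, I1L, I1H), Rg1
```

If the value of $r_1 I(q_c)$ exceeds the monomer forward intensity, no Guinier curve passes through both points and the sketch returns NaN for $R_{g1}$; this is one symptom of the joint point drifting out of the Guinier region.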

Problems in first AUC-SAS
As shown in Figs. 1(e)-1(h), $i_{1H}(q)$ [$= I_{1H}(q)/c_1$], which is given by equation (3), deviated from $i_1(q)_{\rm Xtal}$ even in the $q$ region above the Guinier region ($1.3 < qR_{g1} < 3$) at $r_a = 0.13$ and 0.20. The large deviations, $\Delta i_{1H}(q)/\sigma(q)$, at $r_a = 0.13$ and 0.20 in the middle $q$ region make the connection points, $q_c$, shift out of the Guinier region ($q_c R_{g1} \simeq 1.6$ and 1.9, respectively). Consequently, incorrect $R_{g1}$ values and scattering profiles were obtained. To solve this problem, the connection should be performed in the Guinier region, that is, $I_{1H}(q)$ must be correctly extrapolated to the inside of the Guinier region. In this study, we have developed a method to correctly extrapolate $I_{1H}(q)$ and offer a reasonable $I_1(q)$, even for relatively large $r_a$.

Scattering profile of aggregates
Figure 1. (a)-(c) [...] Arrows indicate the connection points $q_c$ between $i_{1L}(q)$ and $i_{1H}(q)$. Insets show enlarged views of the range $1.2 \leq qR_{g1} \leq 2.0$. (d) Filled blue, green and purple circles show the residuals $\Delta i_1(q)/\sigma(q)$ for BSA6, BSA13 and BSA20, respectively. Here, $\Delta i_1(q) = i_1(q) - i_1(q)_{\rm Xtal}$ and $\sigma(q)$ denotes the error of $i_1(q)$. (e)-(g) Open blue, green and purple circles show $i_{1H}(q)$ [$= I_{1H}(q)/c_1$] given by equation (3) for BSA6, BSA13 and BSA20, respectively. The black line in each panel represents $i_1(q)_{\rm Xtal}$ calculated from the crystal structure of the BSA monomer. (h) Open blue, green and purple circles denote the residuals $\Delta i_{1H}(q)/\sigma(q)$ for BSA6, BSA13 and BSA20, respectively. Here, $\Delta i_{1H}(q) = i_{1H}(q) - i_1(q)_{\rm Xtal}$ and $\sigma(q)$ is the error of $i_{1H}(q)$. The broken line denotes the upper limit of the Guinier approximation range ($qR_{g1} = 1.3$).

The approximation $i_{jH}(q) \simeq i_{1H}(q)$, which gives $I_{1H}(q)$ [equation (3)], becomes inadequate in the middle $q$ region for relatively large $r_a$. To obtain an $I_{1H}(q)$ that is correctly extrapolated to the inside of the Guinier region, we carefully reconsidered the scattering profile of an aggregate. First, the concentration-normalized scattering profile of the $j$-mer, $i_j(q)$, is represented as follows:

$$i_j(q) = \frac{i_1(0)}{j} \left\langle \sum_{k=1}^{j} \sum_{l=1}^{j} F_k(q) F_l^*(q) \exp[-i\mathbf{q}\cdot(\mathbf{R}_k - \mathbf{R}_l)] \right\rangle, \tag{6}$$

where $\mathbf{R}_{k,l}$ and $F_{k,l}(q)$ are the position vector of the center of mass (COM) and the form factor of the $k$th or $l$th subunit, respectively [Fig. 2(a)]. The form factor is normalized so that $\langle |F_{k,l}(0)|^2 \rangle = 1$, where $\langle \ldots \rangle$ denotes the orientational average. The asterisk (*) denotes the complex conjugate. Next, we assumed that the subunits were randomly arranged in the aggregate. According to the 'decoupling approximation method' (Kotlarchyk & Chen, 1983), the form factor is independent of the position in the aggregate: $F_k(q)F_l^*(q)$ and $\exp[-i\mathbf{q}\cdot(\mathbf{R}_k - \mathbf{R}_l)]$ in equation (6) can be decoupled, as in equation (S3) of the supporting information. Therefore, $i_j(q)$ can be expressed as follows (also see §4 of the supporting information):

$$i_j(q) = i_1(q)\left\{1 + \beta(q)\,[T_j(q) - 1]\right\}, \tag{7}$$

where

$$T_j(q) = \frac{1}{j} \sum_{k=1}^{j} \sum_{l=1}^{j} \frac{\sin(qD_{kl})}{qD_{kl}}, \tag{8}$$

$$\beta(q) = \frac{|\langle F(q) \rangle|^2}{\langle |F(q)|^2 \rangle}, \tag{9}$$

and $T_j(q)$ is the inter-subunit structure factor defined by the Debye function [equation (8)] with the distance between the COMs of the $k$th and $l$th subunits, $D_{kl}$. Considering the random arrangement of the subunits, $T_j(q)$ is expressed with the random flight model as equation (10). This model was originally developed for a synthetic polymer chain (Burchard & Kajiwara, 1970) and has been subsequently applied to randomly associated proteins.
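$$T_j(q) = 1 + \frac{2}{j} \sum_{m=1}^{j-1} (j - m) \left[\frac{\sin(qD)}{qD}\right]^m, \tag{10}$$

where $D$ is the average distance between neighboring subunits ($= \langle D_{k,k+1} \rangle$). Assuming that the gyration radius of a subunit, $R_{g1}$, is the effective radius of the subunit, we defined $D = 2R_{g1}$ (see §5 of the supporting information).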
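$\beta(q)$ indicates the shape anisotropy of the subunit [equation (9)]. Because the form factor of a subunit, $F(q)$, is unknown prior to structural analysis of the monomer, we assumed that the subunit is an ellipsoid whose semi-axes are $r$ and $pr$ ($p$ is the axial ratio), as shown in Fig. 2(b). Its form factor is then represented as follows:

$$F(q, \theta) = \frac{3[\sin u - u \cos u]}{u^3}, \tag{11}$$

where

$$u = qr\,(\sin^2\theta + p^2 \cos^2\theta)^{1/2}. \tag{12}$$

Then

$$\langle F(q) \rangle = \int_0^{\pi/2} F(q, \theta) \sin\theta \, d\theta \tag{13}$$

and

$$\langle |F(q)|^2 \rangle = \int_0^{\pi/2} |F(q, \theta)|^2 \sin\theta \, d\theta, \tag{14}$$

where $\theta$ is the orientation angle between the axis of the ellipsoid and $\mathbf{q}$ [Fig. 2(b)]. $\beta(q)$ was obtained by substituting equations (11)-(14) into equation (9). The axial ratio, $p$, is estimated using the frictional ratio $f/f_0$, which is offered by the AUC measurement (Lebowitz et al., 2002) (see further details in §6 of the supporting information).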
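The two ingredients of the aggregate model, the anisotropy factor of an ellipsoidal subunit and the random-flight inter-subunit structure factor, can be evaluated numerically as in the sketch below. It assumes the standard amplitude of an ellipsoid of revolution with semi-axes $(r, r, pr)$ and writes the random-flight sum over chain separations $m$ with weight $(j - m)$; the symbol `beta`, the function names and the midpoint-rule grid size `n_theta` are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def ellipsoid_beta(q, r, p, n_theta=400):
    """Anisotropy factor |<F>|^2 / <|F|^2> of an ellipsoid with semi-axes (r, r, p*r).

    q must be strictly positive. The orientational average is taken with a
    midpoint rule in mu = cos(theta) on (0, 1).
    """
    q = np.atleast_1d(np.asarray(q, dtype=float))
    mu = (np.arange(n_theta) + 0.5) / n_theta
    # u = q r sqrt(sin^2(theta) + p^2 cos^2(theta)) written in terms of mu
    u = q[:, None] * r * np.sqrt(1.0 + (p ** 2 - 1.0) * mu[None, :] ** 2)
    F = 3.0 * (np.sin(u) - u * np.cos(u)) / u ** 3
    return F.mean(axis=1) ** 2 / (F ** 2).mean(axis=1)


def random_flight_T(q, j, D):
    """Random-flight structure factor of a j-mer with neighbor distance D."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    x = np.sinc(q * D / np.pi)          # np.sinc(t) = sin(pi t)/(pi t), so this is sin(qD)/(qD)
    m = np.arange(1, j)                 # separations along the chain; empty for j = 1
    terms = (j - m)[None, :] * x[:, None] ** m[None, :]
    return 1.0 + (2.0 / j) * terms.sum(axis=1)


def aggregate_to_monomer_ratio(q, j, r, p, D):
    """Ratio i_j(q) / i_1(q) under the decoupling approximation."""
    return 1.0 + ellipsoid_beta(q, r, p) * (random_flight_T(q, j, D) - 1.0)
```

Two limits are useful sanity checks: for a sphere ($p = 1$) the anisotropy factor is 1 at every $q$, and as $q \to 0$ the structure factor of a $j$-mer tends to $j$, so that the forward intensity per unit concentration scales with the association number.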

Improved AUC-SAS
By substituting equation (7) into equation (2), $I_1(q)$ is expressed as follows:

$$I_1(q) = \frac{r_1 I(q)}{1 + \beta(q) \sum_{j=2}^{n} r_j\,[T_j(q) - 1]}. \tag{15}$$

Figure 2. Schematic illustrations of an aggregate and a subunit. (a) A schematic illustration of an aggregate ($j = 4$) in which the subunits are randomly arranged. Black points and blue arrows represent the COMs of the subunits and the distances between the COMs of neighboring subunits, respectively. (b) A schematic illustration of the ellipsoidal approximation of a subunit. Blue and red arrows represent the semi-axes. The broken black arrow indicates the scattering vector.
The improved method was demonstrated for BSA and AF solutions with $r_a = 0.20$ (BSA20) and $r_a = 0.21$ (AF21), respectively. Their experimental AUC data are shown in §2 of the supporting information. Fig. 3(a) shows $i_1(q)$ [$= I_1(q)/c_1$] derived using the first AUC-SAS (purple circles) and the improved AUC-SAS (cyan circles) for BSA20. As shown in Fig. 3(b), the deviations $\Delta i_1(q)/\sigma(q)$ for the improved AUC-SAS were sufficiently small [$\Delta i_1(q)/\sigma(q) < 1$] in the entire $q$ region. As shown in Figs. 3(c) and 3(d) and Table 1, the improved AUC-SAS yielded more reasonable structural parameters [$R_{g1}$, $i_1(0)$, $P_1(r)$ (pair distance distribution function) and $D_{\max}$] than the first AUC-SAS. For the solution of the larger protein, AF (AF21), the improved AUC-SAS also gave a reasonable $i_1(q)$ and structural parameters [Figs. 3(e)-3(h) and Table 1]. Thus, the improved AUC-SAS was applicable to a solution with a relatively large $r_a$ ($\approx 0.2$), which is the general condition for most SAS measurements.

Figure 3. Demonstration of the first and improved AUC-SAS for BSA20 and AF21. In all panels, purple and cyan circles represent the results of the first AUC-SAS and the improved AUC-SAS, respectively. Concentration-normalized scattering profiles, $i_1(q)$, for (a) BSA monomer and (e) AF 24-mer. Residuals $\Delta i_1(q)/\sigma(q)$ for (b) BSA20 and (f) AF21. Guinier plots of $i_1(q)$ for (c) BSA20 and (g) AF21. Solid purple and cyan lines express the least-squares fitting lines with the Guinier formula. Pair distance distribution functions, $P_1(r)$, for (d) BSA20 and (h) AF21.
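The correction at the heart of the improved method amounts to dividing the measured profile by the relative contribution of all species, with each aggregate described by an anisotropy factor and an inter-subunit structure factor. A minimal sketch is given below, under the assumption that these two quantities have already been evaluated on the same $q$ grid as the data; the function name and argument layout are hypothetical and do not describe the distributed Igor Pro software.

```python
import numpy as np


def improved_auc_sas(I, r, beta, T):
    """Monomer profile I_1(q) from the solution profile I(q).

    I    : measured profile of the solution, shape (nq,)
    r    : weight fractions r_j with the monomer first, shape (n,), summing to 1
    beta : anisotropy factor of the subunit on the same q grid, shape (nq,)
    T    : inter-subunit structure factors T_j(q), shape (n, nq), with T[0] == 1
    """
    I = np.asarray(I, dtype=float)
    r = np.asarray(r, dtype=float)
    beta = np.asarray(beta, dtype=float)
    T = np.asarray(T, dtype=float)
    # Relative contribution of all species: 1 + beta * sum_j r_j (T_j - 1)
    denom = 1.0 + beta * np.sum(r[:, None] * (T - 1.0), axis=0)
    return r[0] * I / denom
```

When every $T_j(q)$ equals 1, as at sufficiently high $q$, the expression reduces to $r_1 I(q)$, which is the high-$q$ relation used by the first AUC-SAS; at $q = 0$, where the anisotropy factor is 1 and $T_j = j$, it reproduces the mass-weighted forward intensity.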
Furthermore, we demonstrated the improved AUC-SAS for various proteins with different shapes and sizes (AUC results of the samples are shown in §2 of the supporting information). As shown in Fig. 4 and Table 2, the scattering profiles $i_1(q)$ and structural parameters [$R_{g1}$ and $i_1(0)$] offered by the improved AUC-SAS are consistent with those of SEC-SAXS for these proteins at various $r_a$ (up to $\approx 0.2$).
AUC-SAS is applicable to SANS, which faces the same aggregation problem, as well as to SAXS. We examined AUC-SANS for a BSA solution (BSA3) using the improved AUC-SAS (§7 of the supporting information). For the SANS data of BSA3, the improved AUC-SAS successfully offered a reasonable scattering profile and gyration radius ($R_{g1} = 26.5 \pm 0.2$ Å) that were consistent with those of the crystal structure ($R_{g1,{\rm Xtal}} = 26.7$ Å). For neutron facilities without a SEC-SANS system (Jordan et al., 2016; Johansen et al., 2018; Sato et al., 2021), AUC-SANS is the most promising method for obtaining the aggregation-free scattering profile.

Figure 4. Open red and filled blue circles show the scattering profile $i_1(q)$ given by SEC-SAXS and the improved AUC-SAS, respectively, for (a) BSA20, (b) AF21, (c) Cat8, (d) Lyz6, (e) B2-cry11, (f) OVA4 and (g) RNaseA8.
In §8 and §9 of the supporting information, we evaluate the maximum errors originating from the random flight model and the ellipsoidal approximation. The error in $I_1(q)$ is several per cent at most, even when extreme cases are assumed.
It is often worthwhile analyzing the structure of the aggregate (Kovalchuk et al., 2019). Programs such as SASREFMX and OLIGOMER in the ATSAS package (Petoukhov et al., 2012;Manalastas-Cantos et al., 2021) are well known for modeling of aggregates. However, these programs require the structure of the monomer. Hence, the complementary use of AUC-SAS and these programs is a promising strategy.
Implementing the improved AUC-SAS, Igor Pro-based software (Kline, 2006) has been developed for the utilization of AUC-SAS by SAS experimenters. The required information is the data set of molecular weights (or association numbers), weight fractions and the frictional ratio, which are given by AUC. The scattering profile of the target monomer is obtained just by inputting the AUC information and the SAS profile for the solution. The software is available at https://www.rri.kyoto-u.ac.jp/NSBNG/activity.html. Its usage is described in §10 of the supporting information.

Related literature
The following additional references are only cited in the supporting information for this article: Perkins (2001), Perrin (1934), Pierce et al. (2014).

Table 1
Gyration radii, forward scattering intensities, molecular weights calculated from forward scattering intensities, and maximum pair distances for BSA20 and AF21.
$R_g$ and $i(0)$: gyration radius and concentration-normalized forward scattering intensity for non-treated SAXS, respectively. $R_{g1}$ and $i_1(0)$: gyration radius and concentration-normalized forward scattering intensity of the monomer, respectively, which were derived using AUC-SAS. $M$: molecular weight calculated from the forward scattering intensity. $D_{\max}$: maximum pair distance from $P_1(r)$. The error of the gyration radius is the standard deviation. The errors of the concentration-normalized forward scattering intensity and molecular weight were calculated from the standard deviations of the forward scattering intensity and concentration.

Table 2
Gyration radii and forward scattering intensities given by AUC-SAXS (improved AUC-SAXS) and SEC-SAXS for various proteins.
$R_{g1}$ and $i_1(0)$: gyration radius and concentration-normalized forward scattering intensity, respectively. The error of the gyration radius is the standard deviation. The error of the concentration-normalized forward scattering intensity was calculated from the standard deviations of the forward scattering intensity and concentration.